    SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer

    Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice. In this paper, we propose a style control (SC) VALL-E model based on the neural codec language model (called VALL-E), which follows the structure of the generative pretrained transformer 3 (GPT-3). The proposed SC VALL-E takes text sentences and prompt audio as input and is designed to generate controllable speech, not by simply mimicking the characteristics of the prompt audio but by controlling the attributes to produce diverse voices. We identify tokens in the style embedding matrix of the newly designed style network that represent attributes such as emotion, speaking rate, pitch, and voice intensity, and design a model that can control these attributes. To evaluate the performance of SC VALL-E, we conduct comparative experiments with three representative expressive speech synthesis models: global style token (GST) Tacotron2, variational autoencoder (VAE) Tacotron2, and the original VALL-E. We measure word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) as evaluation metrics to assess the accuracy of generated sentences. To compare the quality of synthesized speech, we measure comparative mean opinion score (CMOS) and similarity mean opinion score (SMOS). To evaluate the style control ability of the generated speech, we observe the changes in F0 and mel-spectrogram when the trained tokens are modified. When using prompt audio that is not present in the training data, SC VALL-E generates a variety of expressive sounds and demonstrates competitive performance compared to the existing models. Our implementation, pretrained models, and audio samples are located on GitHub.
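
The word error rate (WER) named in the abstract above is conventionally the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration and are not taken from the SC VALL-E implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, this value can exceed 1.0 when the hypothesis is much longer than the reference.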

    Dynamical mean-field theory of Hubbard-Holstein model at half-filling: Zero temperature metal-insulator and insulator-insulator transitions

    We study the Hubbard-Holstein model, which includes both the electron-electron and electron-phonon interactions characterized by U and g, respectively, employing the dynamical mean-field theory combined with Wilson's numerical renormalization group technique. A zero temperature phase diagram of metal-insulator and insulator-insulator transitions at half-filling is mapped out which exhibits the interplay between U and g. As U (g) is increased, a metal to Mott-Hubbard insulator (bipolaron insulator) transition occurs, and the two insulating states are distinct and cannot be adiabatically connected. The nature of and transitions between the three states are discussed. Comment: 5 pages, 4 figures. Submitted to Physical Review Letters.

    Pyridoxine induced neuropathy by subcutaneous administration in dogs

    To construct a sensory neuropathy model, excess pyridoxine (150 mg/kg s.i.d.) was injected subcutaneously in dogs over a period of 7 days. During the administration period, the dogs experienced body weight reduction and proprioceptive loss involving the hindquarters. After pyridoxine administration was completed, electrophysiological recordings showed that the M wave remained normal, but the H-reflex of the treated dogs had disappeared at 7 days. The axons and myelin of the dorsal funiculus of L4 were irregularly disrupted, with vacuolation. The dorsal root ganglia of L4 and the sciatic and tibial nerves showed degenerative changes and vacuolation. However, the lateral and ventral funiculi of L4 showed a normal histopathologic pattern. This subcutaneous administration method did not cause systemic toxicity and effectively induced sensory neuropathy, confirming the possibility of producing a pyridoxine-induced sensory neuropathy model in dogs with short-term administration.

    The Characteristics of Metallo-β-Lactamase-Producing Gram-Negative Bacilli Isolated from Sputum and Urine: A Single Center Experience in Korea

    Metallo-β-lactamase (MBL) production usually results in high-level resistance to most β-lactams, and the rapid spread of MBL-producing major gram-negative pathogens is a matter of particular concern worldwide. However, clinical data are scarce, and most studies have compared MBL producer (MP) with MBL non-producer (MNP) strains, which included carbapenem-susceptible isolates. Therefore, we collected clinical data of patients in whom imipenem-nonsusceptible Pseudomonas aeruginosa (PA) and Acinetobacter baumannii (AB) were isolated from sputum or urine, and investigated MBL production and the risk factors associated with MBL acquisition. The antimicrobial susceptibility patterns were also compared between MPs and imipenem-nonsusceptible MNPs (INMNP). Among the 176 imipenem-nonsusceptible isolates, 12 MPs (6.8%) were identified. There was no identifiable risk factor that contributed to the acquisition of MPs when compared to INMNPs, and case-fatalities were not different between the two groups. The percentage of susceptible isolates was higher among MPs for piperacillin/tazobactam and fluoroquinolones, while that for ceftazidime was higher in INMNPs (p < 0.05). As for aztreonam, which is known to be a uniquely stable β-lactam against MBLs, susceptibility was preserved in only two isolates (16.7%) among MPs, and was not higher than that of INMNPs (23.2%). In conclusion, the contribution of MBLs to imipenem non-susceptibility in PA/ABs isolated from sputum and urine was relatively limited, and there was no significant risk factor associated with acquisition of MPs compared with INMNPs. However, limited susceptibility to aztreonam implies that MPs may hold additional resistance mechanisms, such as extended-spectrum β-lactamases, AmpC β-lactamases, or other non-enzymatic mechanisms.

    A Hybrid Channel Estimation Scheme for OFDM Systems

    Accurate channel information is indispensable for coherent reception of OFDM signals. Although a Wiener-type channel estimation filter (CEF) is known to be optimal, it is not easily employable due to its large implementation complexity. In practice, a moving average (MA)-type CEF is often employed, but it may not perform robustly when the channel condition varies. In this paper, we propose a hybrid CEF that takes advantage of both the Wiener and MA CEFs by alternately employing each CEF according to the channel condition. Simulation results show that the proposed hybrid CEF scheme provides near-optimum performance, while significantly reducing the implementation complexity compared to the long-tap Wiener CEF.
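
As a rough illustration of the MA-type CEF mentioned in the abstract above, the sketch below averages noisy per-pilot channel estimates over a centered window. It covers only the moving-average part. The abstract does not state the hybrid scheme's switching rule or the Wiener coefficients, so neither is reproduced here. The function name `moving_average_cef`, the odd centered window, and the shortened windows at the edges are all assumptions made for illustration.

```python
def moving_average_cef(pilot_estimates, taps):
    """Smooth raw channel estimates with an equal-weight moving average.

    pilot_estimates: sequence of (possibly complex) raw channel estimates,
                     one per pilot position.
    taps:            odd window length (assumption: centered window).
    Near the edges the window is truncated to the available samples.
    """
    if taps < 1 or taps % 2 == 0:
        raise ValueError("taps must be a positive odd integer")

    half = taps // 2
    n = len(pilot_estimates)
    smoothed = []
    for k in range(n):
        lo = max(0, k - half)
        hi = min(n, k + half + 1)
        window = pilot_estimates[lo:hi]
        # Equal weights: no channel statistics are needed, which is why
        # this filter is cheap compared with a Wiener-type filter.
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A longer window suppresses more noise but tracks a fast-varying channel less well, which is the robustness trade-off the abstract refers to.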